Implicit regularization is an important way to interpret neural networks. Recent theory has begun to explain implicit regularization with the model of deep matrix factorization (DMF), analyzing the trajectory of discrete gradient dynamics during optimization. These discrete gradient steps are relatively small but not infinitesimal, so they match the practical implementation of neural networks well. So far, discrete gradient dynamics analysis has been successfully applied to shallow networks, but it runs into prohibitively complex computations for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, namely landscape analysis, which focuses on special regions of the loss landscape such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF converges to a second-order critical point after R stages of SPE. This conclusion is further verified experimentally on a low-rank matrix reconstruction problem. This work provides a new theory for analyzing implicit regularization in deep learning.
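As a minimal illustration of this setting, the sketch below runs discrete (finite step size) gradient descent on a depth-3 deep matrix factorization to reconstruct an exactly rank-2 matrix. The dimensions, step size, and initialization scale are illustrative choices, not the paper's experimental setup; the small-initialization regime is where implicit low-rank regularization is typically studied.

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = 20, 2

# Exactly rank-2 target matrix with singular values 1.0 and 0.5.
Q1, _ = np.linalg.qr(rng.standard_normal((n, R)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, R)))
M = Q1 @ np.diag([1.0, 0.5]) @ Q2.T

# Depth-3 DMF, X = W3 @ W2 @ W1, with small random initialization.
scale = 1e-2
W1, W2, W3 = (scale * rng.standard_normal((n, n)) for _ in range(3))

lr = 0.1  # a small but finite step size: discrete, not infinitesimal, dynamics
for _ in range(20000):
    X = W3 @ W2 @ W1
    G = X - M                # gradient of 0.5 * ||X - M||_F^2 w.r.t. X
    g1 = (W3 @ W2).T @ G     # chain rule through the three factors
    g2 = W3.T @ G @ W1.T
    g3 = G @ (W2 @ W1).T
    W1 -= lr * g1
    W2 -= lr * g2
    W3 -= lr * g3

X = W3 @ W2 @ W1
s = np.linalg.svd(X, compute_uv=False)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
print(rel_err, s[:4])  # small error; singular values beyond rank 2 stay near zero
```

Tracking the singular values of X across iterations exhibits the stage-wise behavior the theory describes: each target singular direction is picked up in its own phase, one SPE stage per rank.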
Marine waves significantly disturb the motion of an unmanned surface vehicle (USV), making it difficult for an unmanned aerial vehicle (UAV) to land on a USV that undergoes irregular motion. An oversized landing platform is usually necessary to guarantee landing safety, which limits the number of UAVs that can be carried. We propose a landing system assisted by a tether and robotic manipulation that can land multiple UAVs without increasing the USV's size. An MPC controller stabilizes the end-effector and tracks the UAVs, while an adaptive estimator addresses the disturbance caused by base motion. A working strategy is designed to plan the motion of each device in the system. We validated the manipulator controller through simulations and well-controlled indoor experiments. During field tests, the proposed system caught and placed the UAVs while the disturbed USV roll range was approximately 12 degrees.
The image-based head swapping task aims to stitch a source head onto another source body flawlessly. This seldom-studied task faces two major challenges: 1) preserving the head and body from various sources while generating a seamless transition region; 2) the absence of any paired head swapping dataset or benchmark so far. In this paper, we propose an image-based head swapping framework (HS-Diffusion) that consists of a semantic-guided latent diffusion model (SG-LDM) and a semantic layout generator. We blend the semantic layouts of the source head and source body, and then inpaint the transition region with the semantic layout generator, achieving coarse-grained head swapping. SG-LDM then implements fine-grained head swapping, with the blended layout as a condition, via a progressive fusion process, while preserving the source head and source body with high-quality reconstruction. To this end, we design a head-cover augmentation strategy for training and a neck alignment trick for geometric realism. Importantly, we construct a new image-based head swapping benchmark and propose two tailored metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available at: https://github.com/qinghew/HS-Diffusion.
AI-based creation (e.g., poetry or lyrics generation) has attracted increasing attention from both industry and academia, and many promising models have been proposed in the past few years. Existing methods usually estimate the output based on a single, independent piece of visual or textual information. In practice, however, humans usually create according to their experience, which may involve different modalities and be sequentially correlated. To model this human capability, in this paper we define and tackle a novel AI creation problem based on human experience. More specifically, we study how to generate text based on sequential multimodal information. Compared with previous works, this task is much more difficult because the designed model has to understand and adapt to the semantics across different modalities and effectively transform them into the output in a sequential manner. To alleviate these difficulties, we first design a multi-channel sequence-to-sequence architecture equipped with a multimodal attention network. For more effective optimization, we then propose a curriculum negative sampling strategy tailored to sequential inputs. To benchmark this problem and demonstrate the effectiveness of our model, we manually label a new multimodal experience dataset. With this dataset, we conduct extensive experiments comparing our model with a series of representative baselines, and demonstrate significant improvements of our model on both automatic and human-centered metrics. Code and data are available at: \url{https://github.com/aman-4-real/mmtg}.
A counter-intuitive property of convolutional neural networks (CNNs) is their inherent susceptibility to adversarial examples, which severely hinders the application of CNNs in security-critical domains. Adversarial examples are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective method for improving the robustness of CNNs to adversarial examples. The mechanisms behind adversarial examples and adversarial training are worth exploring. Therefore, this work investigates the similarities and differences between two types of CNNs (normally trained and robust) in information extraction by observing the trends of mutual information. We show that: 1) the amount of mutual information a CNN extracts from original and adversarial examples is almost similar, regardless of whether the CNN is normally or adversarially trained; the reason adversarial examples mislead CNNs may be that they contain more texture-based information about other categories; 2) compared with normal training, adversarial training is more difficult, and robust CNNs extract less information; 3) CNNs trained with different methods have different preferences for certain types of information; normally trained CNNs tend to extract texture-based information from the inputs, while adversarially trained models prefer shape-based information. In addition, we analyze the mutual information estimators used in this work, namely the kernel density estimation and binning methods, and find that, to some extent, these estimators outline the geometric properties of the intermediate-layer outputs.
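As a small, self-contained illustration of one of the estimators mentioned above, the sketch below implements a binning (histogram) estimate of mutual information between two 1-D variables. The bin count and sample size are arbitrary choices for the example, not the paper's settings.

```python
import numpy as np

def mutual_information_binned(x, y, bins=16):
    """Binning (histogram) estimate of I(X; Y) in nats for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                    # joint distribution over bins
    px = pxy.sum(axis=1, keepdims=True)          # marginal of X
    py = pxy.sum(axis=0, keepdims=True)          # marginal of Y
    nz = pxy > 0                                 # avoid log(0) on empty bins
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
noise = rng.standard_normal(100_000)

mi_dep = mutual_information_binned(x, x + 0.1 * noise)  # strongly dependent pair
mi_ind = mutual_information_binned(x, noise)            # independent pair
print(mi_dep, mi_ind)
```

The dependent pair yields a large estimate while the independent pair stays near zero (up to the small positive bias that finite-sample binning estimators are known to have).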
Knowledge graph embedding (KGE) aims to learn representations of entities and relations. Most KGE models have achieved great success, especially in extrapolation scenarios. Specifically, given an unseen triple (h, r, t), a trained model can still correctly predict t from (h, r, ?) or h from (?, r, t); such extrapolation ability is impressive. However, most existing KGE works focus on designing delicate triple modeling functions, which mainly tell us how to measure the plausibility of observed triples, but give limited explanation of why KGE can extrapolate to unseen data and what the important factors are that help it extrapolate. Therefore, in this work we attempt to study two problems of KGE extrapolation: 1. How does KGE extrapolate to unseen data? 2. How to design a KGE model with better extrapolation ability? For problem 1, we first discuss the impact factors for extrapolation at the relation, entity, and triple levels respectively, and propose three Semantic Evidences (SEs), which can be observed from the training set and provide important semantic information for extrapolation. We then verify the effectiveness of the SEs through extensive experiments on several typical KGE methods. For problem 2, to make better use of the SEs at the three levels, we propose a novel GNN-based KGE model called Semantic Evidence aware Graph Neural Network (SE-GNN). In SE-GNN, each level of SE is explicitly modeled by a corresponding neighbor pattern and sufficiently merged through multi-layer aggregation, which helps to obtain more extrapolative knowledge representations. Finally, through extensive experiments on the FB15k-237 and WN18RR datasets, we show that SE-GNN achieves state-of-the-art performance on the knowledge graph completion task and exhibits better extrapolation ability.
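For readers unfamiliar with triple plausibility scoring, the toy sketch below uses the classic TransE scoring function (h + r ≈ t), not the paper's SE-GNN, to show how a trained KGE model ranks candidate tails for a query. The entities and the relation vector here are synthetic stand-ins for learned embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# TransE models a relation as a translation in embedding space: h + r ≈ t.
r_capital = rng.standard_normal(dim)
entities = {
    "Paris": rng.standard_normal(dim),
    "Berlin": rng.standard_normal(dim),
}
# Toy "trained" embeddings: true tails sit near head + relation, up to noise.
entities["France"] = entities["Paris"] + r_capital + 0.01 * rng.standard_normal(dim)
entities["Germany"] = entities["Berlin"] + r_capital + 0.01 * rng.standard_normal(dim)

def score(h, r, t):
    """Plausibility of triple (h, r, t): negative L2 distance, higher is better."""
    return -np.linalg.norm(entities[h] + r - entities[t])

# Tail prediction for the query (Berlin, capital_of, ?): rank all candidate tails.
ranked = sorted(entities, key=lambda t: score("Berlin", r_capital, t), reverse=True)
print(ranked[0])
```

Knowledge graph completion benchmarks such as FB15k-237 and WN18RR evaluate exactly this kind of ranking, reporting how highly the true tail (here "Germany") is placed among all candidates.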
Tensor decomposition is a powerful tool for dimensionality reduction and for functional interpretation of multidimensional data such as signals. Existing tensor decomposition objectives (e.g., the Frobenius norm) are designed to fit the raw data under statistical assumptions, which may not align with downstream classification tasks. In practice, the raw input tensor can contain irrelevant information, while data augmentation techniques can be used to smooth out class-irrelevant noise in samples. This paper addresses the above challenges by proposing Augmented Tensor Decomposition (ATD), which effectively incorporates data augmentation and self-supervised learning (SSL) to boost downstream classification. To address the non-convexity of the new augmented objective, we develop an iterative method that allows the optimization to follow an alternating least squares (ALS) fashion. We evaluate ATD on multiple datasets. It achieves 0.8%-2.5% accuracy gains over tensor-based baselines. Furthermore, our ATD model shows comparable or better performance (e.g., up to 15% higher accuracy) against self-supervised and autoencoder baselines, while using fewer than 5% of the learnable parameters of these baseline models.
Decompilation aims to transform a low-level programming language (LPL) (e.g., a binary file) into its functionally equivalent high-level programming language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
Nearest-Neighbor (NN) classification has been proven a simple and effective approach for few-shot learning. Query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support datum based on its similarity to the different base prototypes. Then, we perform NN classification using the discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient non-learning-based method can outperform, or be at least comparable to, SOTA methods that require additional learning steps.
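The calibration idea can be sketched as follows: each support feature is shifted toward base-class prototypes in proportion to its similarity to them, and queries are then classified by nearest neighbor on the calibrated supports. This is a simplified stand-in for the paper's discrete calibration; the mixing weight `alpha`, temperature `tau`, and all features here are synthetic assumptions.

```python
import numpy as np

def calibrate(support, base_prototypes, alpha=0.7, tau=10.0):
    """Shift each support feature toward similar base prototypes (a simplified
    prior-driven calibration; alpha and tau are illustrative hyperparameters)."""
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    p = base_prototypes / np.linalg.norm(base_prototypes, axis=1, keepdims=True)
    sim = s @ p.T                            # cosine similarity to each prior
    w = np.exp(tau * sim)
    w /= w.sum(axis=1, keepdims=True)        # softmax weights over base classes
    return alpha * support + (1 - alpha) * (w @ base_prototypes)

def nn_classify(query, support, support_labels):
    """Assign each query the label of its nearest support sample."""
    d = np.linalg.norm(query[:, None, :] - support[None, :, :], axis=-1)
    return support_labels[d.argmin(axis=1)]

rng = np.random.default_rng(0)
dim = 32
base_prototypes = rng.standard_normal((10, dim))       # priors from base classes
support = rng.standard_normal((5, dim))                # 5-way 1-shot support set
labels = np.arange(5)
query = support + 0.1 * rng.standard_normal((5, dim))  # queries near each support

support_cal = calibrate(support, base_prototypes)
pred = nn_classify(query, support_cal, labels)
print(pred)
```

Note that nothing here is trained: the calibration is a fixed, similarity-weighted shift, which is what makes such a non-learning method cheap to apply at test time.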
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, the lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable and leaves them performing unsatisfactorily and inflexibly in downstream remote sensing tasks. To fill this gap, in this paper evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value that describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at: https://github.com/gaopiaoliang/Evidential.
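A minimal sketch of uncertainty-aware decision-level fusion in the subjective-logic style described above: each view's non-negative evidence parameterizes a Dirichlet distribution, the uncertainty u = K / S quantifies that view's decision-making risk, and views with lower risk receive more weight. The weighting rule and the toy evidence vectors are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def evidence_to_opinion(evidence):
    """Subjective-logic opinion from non-negative evidence for one view.
    alpha = evidence + 1 parameterizes a Dirichlet; u = K / S is the uncertainty."""
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S           # per-class belief mass
    u = len(evidence) / S           # decision-making risk of this view
    return belief, u

def fuse_views(evidences):
    """Decision-level fusion: views with lower uncertainty get more weight
    (a simplified weighting, not the paper's exact rule)."""
    beliefs, us = zip(*(evidence_to_opinion(e) for e in evidences))
    w = np.array([1.0 - u for u in us])
    w /= w.sum()
    fused = sum(wi * b for wi, b in zip(w, beliefs))
    return int(fused.argmax()), us

# Aerial view: confident evidence for class 2; ground view: weak and conflicting.
aerial = np.array([1.0, 2.0, 40.0, 1.0])
ground = np.array([3.0, 2.0, 1.0, 2.0])
pred, (u_aerial, u_ground) = fuse_views([aerial, ground])
print(pred, u_aerial, u_ground)
```

Here the confident aerial view (low u) dominates the fusion, so the prediction follows it rather than the noisy ground view, which is the behavior the decision-level strategy is designed to achieve.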